The impact of language dynamics on the capitalization of broadcast news
Abstract
This paper investigates the impact of language dynamics on the capitalization of broadcast news transcriptions. Most of the capitalization information comes from a large newspaper corpus. Three speech corpus subsets, each from a different time period, are used for evaluation, in order to assess how much it matters to have training data from nearby time periods. Results are reported for both manual and automatic transcriptions, which also shows the impact of recognition errors on the capitalization task. Our approach is based on maximum entropy models, uses an unlimited vocabulary, and is suitable for language adaptation. The language model for a given period is produced by retraining a previous language model with data from that period. The resulting language model can be sorted and then pruned to reduce computational resources, without much impact on the final results.
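The retrain-then-prune idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the label set, the feature templates, the plain gradient-ascent trainer, the toy sentences, and every function name below are assumptions made for illustration. Weights live in a dictionary, so the vocabulary is not fixed in advance, and retraining simply starts from the previous model's weights.

```python
import math

LABELS = ("lower", "first_cap")  # hypothetical label set


def features(words, i):
    """Sparse binary features for token i (assumed templates: word, previous, next)."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"w={words[i]}", f"prev={prev_w}", f"next={next_w}"]


def probabilities(weights, feats):
    """Maximum entropy (softmax) distribution over LABELS; unseen features weigh 0."""
    scores = [sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(data, weights=None, epochs=30, lr=0.5):
    """Stochastic gradient ascent on the log-likelihood.

    Passing `weights` retrains a previous model instead of starting from zero.
    """
    weights = dict(weights or {})
    for _ in range(epochs):
        for words, labels in data:
            for i, gold in enumerate(labels):
                feats = features(words, i)
                probs = probabilities(weights, feats)
                for y, p in zip(LABELS, probs):
                    grad = (1.0 if y == gold else 0.0) - p
                    for f in feats:
                        weights[(f, y)] = weights.get((f, y), 0.0) + lr * grad
    return weights


def prune(weights, keep):
    """Sort entries by absolute weight and keep only the `keep` largest."""
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:keep])


def predict(weights, words):
    """Most probable label for each lowercase token."""
    result = []
    for i in range(len(words)):
        probs = probabilities(weights, features(words, i))
        result.append(LABELS[probs.index(max(probs))])
    return result


# Toy data (invented): an older period and a newer period with a new name.
older = [("the president visited lisbon".split(),
          ["lower", "lower", "lower", "first_cap"])]
newer = [("obama visited lisbon".split(),
          ["first_cap", "lower", "first_cap"])]

base_model = train(older)                         # model for the earlier period
adapted_model = train(newer, weights=base_model)  # retrained on the later period
small_model = prune(adapted_model, keep=10)       # sorted, then pruned

base_prediction = predict(base_model, "obama visited lisbon".split())
adapted_prediction = predict(adapted_model, "obama visited lisbon".split())
```

In this toy setting the base model has never seen the new name and labels it from context alone, while the adapted model has a weight for the word itself. How much pruning a real model tolerates would have to be measured on held-out data.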
Similar resources
Recovering Capitalization and Punctuation Marks on Speech Transcriptions
This work addresses two metadata annotation tasks involved in the production of rich transcripts: automatic capitalization and punctuation mark recovery. The main focus is broadcast news, using both manual and automatic speech transcripts. Different capitalization models were analysed and compared, and results support the idea that generative approaches capture the structure of writte...
Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news
The following material presents a study about recovering punctuation marks, and capitalization information from European Portuguese broadcast news speech transcriptions. Different approaches were tested for capitalization, both generative and discriminative, using: finite state transducers automatically built from language models; and maximum entropy models. Several resources were used, includi...
A Study on News Anchors’ Meta-Language and Non-Verbal Factors and their Impact on Audiences
Non-verbal communication, or body messaging, occurs when facial expressions, tone of voice, head and neck movements, smiling and ... affect others, whether intentionally or unintentionally. Farhangi, in “Nonverbal Communication: The Art of Using Movement and Sound”, defines the field as follows: "Non-verbal communication is phonetic and non-phonetic messages which have been explained by other than l...
Automatic Recovery of Punctuation Marks and Capitalization Information for Iberian Languages
This paper shows experimental results concerning automatic enrichment of the speech recognition output with punctuation marks and capitalization information. The two tasks are treated as two classification problems, using a maximum entropy modeling approach. The approach is language independent as reinforced by experiments performed on Portuguese and Spanish Broadcast News corpora. The discrimi...
The Impact of Corporate Income Tax and Firm Size on Fixed Investment
This paper analyzes the impact of income taxes and market capitalization on fixed investment (investment in tangible assets) by manufacturing companies listed on KSE. It examines how corporate income taxes affect fixed investment by reducing the cash flow available for a firm to invest, and how firm size, in light of market capitalization, affects fixed ...
Publication date: 2008